
    Robust joint and individual variance explained

    Discovering the common (joint) and individual subspaces is crucial for the analysis of multiple data sets, including multi-view and multi-modal data. Several statistical machine learning methods have been developed for discovering features common across multiple data sets. The most widely studied family of such methods is Canonical Correlation Analysis (CCA) and its variants. Even though CCA is a powerful tool, it has several drawbacks that make it challenging to apply in computer vision: it discovers only common features, not individual ones, and it is sensitive to the gross errors present in visual data. Recently, efforts have been made to develop methods that discover both individual and common components; nevertheless, these methods are mainly applicable to only two data sets. In this paper, we investigate the use of a recently proposed statistical method, the so-called Joint and Individual Variance Explained (JIVE) method, for the recovery of joint and individual components in an arbitrary number of data sets. Since JIVE is not robust to gross errors, we propose alternatives that are robust to non-Gaussian noise of large magnitude and able to automatically determine the rank of the individual components. We demonstrate the effectiveness of the proposed approach on two computer vision applications, namely facial expression synthesis and face age progression in-the-wild.
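
    As context, here is a minimal NumPy sketch of the basic (non-robust) JIVE idea that the paper builds on: alternating low-rank estimates of the joint component of the stacked views and the per-view individual components. The ranks, toy data, and function names are illustrative assumptions, not the authors' robust formulation.

```python
import numpy as np

def svd_lowrank(X, rank):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def jive(views, joint_rank, individual_ranks, n_iter=50):
    """Alternate between the joint component of the row-concatenated
    views and the individual (view-specific) components."""
    X = np.vstack(views)                       # (sum d_i) x n samples
    sizes = [v.shape[0] for v in views]
    A = [np.zeros_like(v) for v in views]      # individual parts
    for _ in range(n_iter):
        J = svd_lowrank(X - np.vstack(A), joint_rank)       # joint part
        Js = np.split(J, np.cumsum(sizes)[:-1], axis=0)
        A = [svd_lowrank(v - j, r)                          # per view
             for v, j, r in zip(views, Js, individual_ranks)]
    return Js, A

# toy example: two views sharing a rank-1 joint signal
rng = np.random.default_rng(0)
z = rng.normal(size=(1, 100))
X1 = rng.normal(size=(20, 1)) @ z + 0.1 * rng.normal(size=(20, 100))
X2 = rng.normal(size=(30, 1)) @ z + 0.1 * rng.normal(size=(30, 100))
J, A = jive([X1, X2], joint_rank=1, individual_ranks=[1, 1])
```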

    Face flow

    In this paper, we propose a method for the robust and efficient computation of multi-frame optical flow in an expressive sequence of facial images. We formulate a novel energy minimisation problem for establishing dense correspondences between a neutral template and every frame of a sequence. We exploit the highly correlated nature of human expressions by representing dense facial motion using a deformation basis. Furthermore, we exploit the even higher correlation between deformations in a given input sequence by imposing a low-rank prior on the coefficients of the deformation basis, yielding temporally consistent optical flow. Our model-based formulation, in conjunction with the inverse compositional strategy and the low-rank matrix optimisation we adopt, leads to a highly efficient algorithm for calculating facial flow. For experimental evaluation, we present quantitative experiments on a challenging new benchmark of face sequences, with dense ground-truth optical flow provided by motion-capture data. We also provide qualitative results on a real sequence displaying fast motion and occlusions. Extensive quantitative and qualitative comparisons demonstrate that the proposed method outperforms state-of-the-art optical flow and dense non-rigid registration techniques, whilst running an order of magnitude faster.
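
    To illustrate just the low-rank prior on the deformation coefficients (the paper couples this with an inverse compositional energy minimisation, which is not shown here), the following is a hedged NumPy sketch using singular value thresholding; all names and sizes are made-up stand-ins.

```python
import numpy as np

def svt(C, tau):
    """Singular value thresholding: the proximal operator of the
    nuclear norm, a standard way to impose a low-rank prior."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# B: deformation basis (2 * n_pixels x K), one column per basis flow
# C: per-frame coefficients (K x n_frames); flow for all frames = B @ C
rng = np.random.default_rng(0)
B = rng.normal(size=(2 * 1000, 5))
C = rng.normal(size=(5, 40))
C_lowrank = svt(C, tau=1.0)    # temporally consistent coefficients
flows = B @ C_lowrank          # dense flow per frame, one column each
```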

    Locality-preserving Directions for Interpreting the Latent Space of Satellite Image GANs

    We present a locality-aware method for interpreting the latent space of wavelet-based Generative Adversarial Networks (GANs) that can capture the large spatial and spectral variability characteristic of satellite imagery. By focusing on preserving locality, the proposed method is able to decompose the weight space of pre-trained GANs and recover interpretable directions that correspond to high-level semantic concepts (such as urbanization, structure density, and flora presence), which can subsequently be used for guided synthesis of satellite imagery. In contrast to commonly used approaches that capture the variability of the weight space in a reduced-dimensionality space (e.g., via Principal Component Analysis, PCA), we show that preserving locality leads to vectors with different angles that are more robust to artifacts and better preserve class information. Via a set of quantitative and qualitative examples, we further show that the proposed approach can outperform both baseline geometric augmentations and global, PCA-based approaches for data synthesis in the context of data augmentation for satellite scene classification.
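
    For contrast, the global PCA baseline that the abstract argues against can be sketched in a few lines: principal directions of a layer's weight space used as candidate semantic edits. The weight matrix below is a random stand-in, and the proposed locality-preserving method differs from this baseline.

```python
import numpy as np

def pca_directions(W, k):
    """Baseline: top-k principal directions of a layer's weight
    space (rows of W), used as candidate semantic edit directions."""
    Wc = W - W.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    return Vt[:k]                        # k directions in latent space

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 128))          # stand-in for pre-trained weights
dirs = pca_directions(W, k=10)
z_edit = rng.normal(size=128) + 3.0 * dirs[0]   # move along one direction
```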

    Parts of Speech-Grounded Subspaces in Vision-Language Models

    Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks. However, their utility is limited by their entanglement with respect to different visual attributes. For instance, recent work has shown that CLIP image representations are often biased toward specific visual properties (such as objects or actions) in an unpredictable manner. In this paper, we propose to separate representations of the different visual modalities in CLIP's joint vision-language space by leveraging the association between parts of speech and specific visual modes of variation (e.g., nouns relate to objects, adjectives describe appearance). This is achieved by formulating an appropriate component analysis model that learns subspaces capturing the variability corresponding to a specific part of speech, while jointly minimising variability with respect to the rest. Such a subspace yields disentangled representations of the different visual properties of an image or text in closed form, while respecting the underlying geometry of the manifold on which the representations lie. Moreover, we show that the proposed model additionally facilitates learning subspaces corresponding to specific visual appearances (e.g., artists' painting styles), which enables the selective removal of entire visual themes from CLIP-based text-to-image synthesis. We validate the model both qualitatively, by visualising the subspace projections with a text-to-image model and by preventing the imitation of artists' styles, and quantitatively, through class-invariance metrics and improvements to baseline zero-shot classification. Accepted at NeurIPS 2023.
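
    One classical way to capture the variability of one embedding set while suppressing another, as the abstract describes, is a generalized eigenproblem. The sketch below is an illustration under that assumption, with random stand-ins for CLIP embeddings; it is not the paper's exact closed-form model.

```python
import numpy as np
from scipy.linalg import eigh

def pos_subspace(X_target, X_rest, k, eps=1e-3):
    """Directions capturing variability of the target part-of-speech
    embeddings while suppressing the rest, via the generalized
    symmetric eigenproblem  S_t v = lambda (S_r + eps I) v."""
    S_t = np.cov(X_target, rowvar=False)
    S_r = np.cov(X_rest, rowvar=False) + eps * np.eye(X_target.shape[1])
    _, V = eigh(S_t, S_r)                # eigenvalues in ascending order
    return V[:, ::-1][:, :k]             # top-k generalized eigenvectors

rng = np.random.default_rng(0)
nouns = rng.normal(size=(200, 64))       # stand-ins for text embeddings
adjectives = rng.normal(size=(200, 64))
U = pos_subspace(nouns, adjectives, k=8)
scores = nouns @ U                       # scores along the directions
```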

    Comparative studies on energy efficiency and GHG emissions between conventional and organic olive groves in Greece and Portugal

    Nowadays, traditional farming based on achieving high yields using high inputs is gradually shifting either towards the maximum possible crop yield using minimal inputs in an optimized way, or towards organic farming. The latter is usually accomplished by a low yield of high-quality products without the use of conventional agrochemicals (i.e., fertilizers and pesticides). In general, this approach leads to lower energy consumption per unit area of land, and therefore lower cost and reduced greenhouse gas (GHG) emissions. However, from a global perspective it carries the risk of a significant reduction in total production. Hence, it is vital to consider energy efficiency improvement, namely the decrease of primary energy consumption for the production of a unit of agricultural product (expressed in weight or volume units), within the farm boundaries. Improvement of energy efficiency is a key parameter that positively affects the overall efficiency of crop farming systems in terms of energy and GHG emissions. In the present paper, two case studies of olive groves in Greece (the "Sterea Ellada" region) and Portugal (the "Alentejo" region) were compared to illustrate the effect on energy efficiency and GHG emissions of moving from conventional to organic olive grove cultivation in these different locations. The analysis was based on two simple framework models using information provided by farmers and literature data regarding the inputs and outputs of each olive grove. The models were adjusted according to the olive variety, the agricultural practices followed, and the location of the production system. Considering the specific energy consumption per unit of product, in the case of the Greek olive grove, organic farming reduces energy consumption by 13.9%, while the final yield is reduced by 30%; GHG emissions are reduced by 58%. In the case of the Portuguese olive grove, organic farming significantly reduces crop yield (by 54.5%), while energy efficiency is improved by 9.7% and GHG emissions are reduced by 26%.
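
    The key indicator here, specific energy consumption per unit of product, can be made concrete with a toy calculation; every number below is hypothetical, and only the formula mirrors the text.

```python
# Hypothetical illustration of the indicator (all inputs are made up).
energy_in_conv = 12_000.0   # MJ/ha, conventional grove (assumed)
yield_conv = 4_000.0        # kg olives/ha (assumed)
energy_in_org = 7_200.0     # MJ/ha, organic grove (assumed)
yield_org = 2_800.0         # kg/ha, i.e. a 30% lower yield

sec_conv = energy_in_conv / yield_conv   # specific energy, MJ/kg
sec_org = energy_in_org / yield_org
change = (sec_conv - sec_org) / sec_conv * 100
print(f"specific energy: {sec_conv:.2f} -> {sec_org:.2f} MJ/kg "
      f"({change:.1f}% reduction)")
```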

    3D Reconstruction of 'In-the-Wild' Faces in Images and Videos

    3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and are among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral and expressive faces. However, all such datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models sufficient to reconstruct faces captured in unconstrained conditions ('in-the-wild'). In this paper, we propose the first 'in-the-wild' 3DMM by combining a statistical model of facial identity and expression shape with an 'in-the-wild' texture model. We show that such an approach allows for a greatly simplified fitting procedure for images and videos, as there is no need to optimise with respect to the illumination parameters. We have collected three new benchmarks that combine 'in-the-wild' images and video with ground-truth 3D facial geometry, the first of their kind, and report extensive quantitative evaluations using them that demonstrate our method is state-of-the-art.
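
    The simplification the abstract mentions (no illumination parameters to optimise) means the fit stays close to regularised linear least squares on the statistical model. The sketch below is a generic illustration with random stand-ins, not the authors' full pipeline.

```python
import numpy as np

def fit_linear_model(x_obs, mean, basis, reg=0.1):
    """Ridge least-squares fit of the linear model x ~ mean + basis @ p.
    Without an illumination term, the texture fit stays linear,
    just like this shape fit."""
    A, b = basis, x_obs - mean
    p = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)
    return mean + A @ p, p

rng = np.random.default_rng(0)
mean = rng.normal(size=300)           # stand-in: flattened 3D vertices
basis = rng.normal(size=(300, 20))    # identity + expression components
x_obs = mean + basis @ rng.normal(size=20) + 0.05 * rng.normal(size=300)
x_fit, p = fit_linear_model(x_obs, mean, basis)
```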

    TensorLy: tensor learning in Python

    Tensors are higher-order extensions of matrices. While matrix methods form the cornerstone of traditional machine learning and data analysis, tensor methods have been gaining increasing traction. However, software support for tensor operations is not on the same footing. To bridge this gap, we have developed TensorLy, a Python library that provides a high-level API for tensor methods and deep tensorized neural networks. TensorLy aims to follow the same standards adopted by the main projects of the Python scientific community, and to integrate seamlessly with them. Its BSD license makes it suitable for both academic and commercial applications. TensorLy's backend system allows users to perform computations with several libraries, such as NumPy and PyTorch, among others, and to scale them across multiple CPU or GPU machines. In addition, using a deep-learning framework as the backend makes it easy to design and train deep tensorized neural networks. TensorLy is available at https://github.com/tensorly/tensorly
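
    A minimal usage sketch of the API described above: a CP decomposition with a swappable backend.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('numpy')            # swap for 'pytorch' to run on GPU

X = tl.tensor(np.random.rand(8, 8, 8))
cp = parafac(X, rank=3)            # CP (CANDECOMP/PARAFAC) decomposition
X_hat = tl.cp_to_tensor(cp)        # rebuild tensor from weights/factors
print(tl.norm(X - X_hat) / tl.norm(X))   # relative reconstruction error
```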

    PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs

    Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control.
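
    The matrix analogue of the factorization the abstract describes is semi-NMF (Ding et al.), with nonnegative "parts" and unconstrained "appearances"; the paper applies the tensor version to GAN feature maps. The sketch below uses that matrix analogue on a random stand-in for flattened feature maps.

```python
import numpy as np

def _pos(A):
    return (np.abs(A) + A) / 2

def _neg(A):
    return (np.abs(A) - A) / 2

def semi_nmf(X, k, n_iter=200, eps=1e-9):
    """Semi-NMF: X ~ F @ G.T, with the 'parts' matrix G constrained
    nonnegative and the 'appearances' matrix F unconstrained."""
    rng = np.random.default_rng(0)
    G = np.abs(rng.normal(size=(X.shape[1], k)))
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)        # least-squares step
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((_pos(XtF) + G @ _neg(FtF)) / # multiplicative step
                     (_neg(XtF) + G @ _pos(FtF) + eps))
    return F, G

# flattened feature maps: rows = channels, columns = spatial locations
X = np.random.default_rng(1).normal(size=(64, 256))
F, G = semi_nmf(X, k=6)   # G's columns act as spatial part masks
```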